Welcome to the ATFS (Alliance for Tropical Forest Science) data
harmonization app!
1 Intro
The app is a tool meant to be used by two or more networks that are
planning to combine their data for a common analysis.
1.1 Profiles
The app relies on “Profiles” that indicate how the data is stored in
the file(s) provided: names of columns storing the DBH, the census ID,
the tree tag, units of measurements etc…
A profile is a .rds file that is downloaded via the app once all the
information about the data has been provided in the Headers and Units
tab of the app.
The same profile can be uploaded as “input profile” in the Headers and
Units tab, to speed up the process once your network’s data has been
profiled, and/or as “output profile” in the Output format tab, to
transform other networks’ data into that profile.
Some networks have their profile stored within the app.
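Because a profile is an ordinary .rds file, it can be written and read back in R with saveRDS() and readRDS(). The sketch below uses a toy profile; its field names (DBH, CensusID, TreeTag, DBHUnit) are hypothetical and are not the app's actual profile structure.

```r
# Toy profile: a named list of column names and units.
# All field names here are hypothetical, not the app's real profile fields.
profile <- list(
  DBH = "dbh_mm",
  CensusID = "census",
  TreeTag = "tag",
  DBHUnit = "mm"
)

# Save the profile to a temporary .rds file and read it back
profile_path <- tempfile(fileext = ".rds")
saveRDS(profile, profile_path)
restored <- readRDS(profile_path)

# Inspect the structure of the restored profile
str(restored)
```

Running str() on a profile downloaded from the app is a quick way to see what it actually stores.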
1.2 Getting your data ready
The app only accepts CSV files.
It performs best if all the information that you want to share is
collated into one analytical file, so we recommend that you append your
species and plot information to your measurement information beforehand,
and upload that one bigger file into the app.
That said, you can also use the app to do exactly that.
There is no limit to the number of files you can upload, but they all
need to connect to each other in one way or another, so that by
stacking and/or merging them, it is possible to collate them down to one
file. We will get to this in more detail in a moment.
The app also relies on tidy
data, which means that every column is a variable, every row is an
observation and every cell is a single value. For example, a data set
with multiple columns for the DBH measurement (e.g. DBH_2015, DBH_2020
etc…) is not a tidy data set. Instead, there should be a column for the
variable year (which, in our example, will take a value of
2015 or 2020), and a column for DBH. If your data is not in
a tidy format, the Tidy table tab
will help you reshape your data.
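To make the DBH_2015 / DBH_2020 example concrete, here is a minimal base R sketch of the same reshaping done by hand. The toy table and its tag column are assumptions for illustration, and this is not necessarily how the app does it internally.

```r
# Toy wide table: one DBH column per census year (not tidy)
wide <- data.frame(
  tag = c("T1", "T2"),
  DBH_2015 = c(10.2, 15.1),
  DBH_2020 = c(11.0, 16.3)
)

# Build one block of rows per DBH column, then stack the blocks,
# so that year and DBH each end up in their own column
dbh_cols <- c("DBH_2015", "DBH_2020")
long <- do.call(rbind, lapply(dbh_cols, function(col) {
  data.frame(
    tag = wide$tag,
    year = as.integer(sub("DBH_", "", col)),
    DBH = wide[[col]]
  )
}))

print(long)
```

The tidy table has one row per tree and census, so a new census adds rows rather than columns.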
We recommend running the app on your local machine (via R and RStudio)
if one of the following cases applies to you:
You have a poor internet connection
You are working with large data files
You are familiar with the development of Shiny apps and would like
to troubleshoot any issues you may encounter yourself
To open the app in R, you will need to install the DataHarmonization
R package and launch Shiny with the following lines of code.
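# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# install the R package
devtools::install_github("Alliance-for-Tropical-Forest-Science/DataHarmonization", build_vignettes = TRUE)
# run the app
shiny::runGitHub("Alliance-for-Tropical-Forest-Science/DataHarmonization", subdir = "inst/app")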
Note that you may need to install the devtools package first
and that installing the DataHarmonization R package may ask you to
update a list of packages.
You’ll want to re-install the package every once in a while,
to get the latest version of the app.
1.4.2 Running the app online
If you don’t have R and RStudio and if your data is not too big, you
can choose to run the online version of the app by clicking on this link. Note
that the online version may be lagging behind the GitHub version.
2 Interacting with the app
Once the app is launched you can start interacting with it.
There are multiple tabs to go through. Some tabs will be skipped
automatically if they don’t apply to your situation and you may skip
others if you don’t need/want them.
When you land on a tab, always advance with an action button
(even if skipping) so your inputs are taken into account. You
may use the navigation panel to return to a previous tab but remember to
click on an action button to save your updated entries.
2.1 Upload your file(s)
This tab starts with information that we already covered in the intro.
The checklist is only a guideline to help you get ready, and you don’t
actually need to check the boxes to keep going.
The numbered tasks are the elements that you do need to complete to
be able to move forward.
1. Indicate how many tables you wish to upload.
2. Indicate the finest level of measurement in your data:
   Plot: if your data only consists of plot level measurements like species richness, total basal area, total number of stems etc…
   Species: if your data consists of species level measurements like abundance, basal area etc… This does not prevent you from also uploading plot level information that is stored in a separate file.
   Tree: if your data consists of tree diameters, circumferences etc… and you are only measuring the main stem of each tree. This does not prevent you from also uploading plot and species level information if they are stored in separate file(s).
   Stem: if your data consists of stem diameters, circumferences etc… and some stems belong to the same tree. This does not prevent you from uploading plot and species level information if they are stored in separate file(s).
   Again, if you have stem level data and are also uploading plot level information, upload that file alongside and indicate that your level of measurement is “Stem”.
3. Upload your tables. You’ll have as many upload boxes as you indicated needing in step 1. For each of them:
   Click on Browse... and navigate to the CSV file you want to upload.
   Type a more meaningful name to replace the generic “Table1”, “Table2” etc… This is particularly useful if you are uploading more than one file.
   Check on the right hand side that the columns and rows of your data are rendering properly.
   In the unlikely event that your tables are not rendering properly, adjust the parameters (separator and header) by clicking on the little gear icon.
Click on SUBMIT to proceed to the next step.
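If a table does not render properly, the separator is often the cause. The sketch below reproduces the problem outside the app with base R's read.csv(); the toy file is made up, and the link between the gear icon's parameters and the sep and header arguments is an assumption.

```r
# Write a small semicolon-separated file to stand in for an uploaded table
csv_path <- tempfile(fileext = ".csv")
writeLines(c("tag;dbh", "T1;10.2", "T2;15.1"), csv_path)

# With the default comma separator, each line is read as a single value
wrong <- read.csv(csv_path)
ncol(wrong)

# Declaring the separator and the header row recovers both columns
right <- read.csv(csv_path, sep = ";", header = TRUE)
str(right)
```

Checking a file this way before uploading tells you which separator to pick in the app.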
2.2 Stack tables
If you uploaded more than one table, you will be taken to the
Stack tables tab, but this tab will be skipped if you only
uploaded one table.
You will need to stack 2 or more tables if you are collecting the
same information in multiple files. This can be the case if, for
example, you are keeping your measurements from different plots in
different files, or if you are keeping one file per census.
If you don’t need to stack tables, click on SKIP THIS STEP.
It is important that the files you are stacking have the same
set of columns.
Select all the tables that need to be stacked
Click on STACK TABLES
Double check your newly created table looks ok
Click on GO TO MERGE to proceed to the next
step. (Note: if you are down to one table at this stage the button’s
label will change, so click SKIP MERGING SINCE ALL YOUR DATA IS
NOW STACKED).
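Stacking amounts to appending rows from tables that share the same columns. A minimal base R sketch with two made-up census tables follows; the app's own implementation may differ.

```r
# Two toy tables, one per census, with the same set of columns
census1 <- data.frame(tag = c("T1", "T2"), census = 1, dbh = c(10.2, 15.1))
census2 <- data.frame(tag = c("T1", "T2"), census = 2, dbh = c(11.0, 16.3))

# Stacking only makes sense if the column sets match
stopifnot(setequal(names(census1), names(census2)))

# rbind() matches data frame columns by name and appends the rows
stacked <- rbind(census1, census2)
nrow(stacked)
```

The setequal() check mirrors the requirement that stacked files have the same set of columns.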
2.3 Merge tables
If you uploaded more than one table and not all of them were stacked,
you will be taken to the Merge tables tab, but this tab
will be skipped if you only uploaded one table, or if all your tables
were stacked.
At the end of this stage you have to be down to one table.
You need to use merging if, e.g., your species or your plot
information is stored in a different table than your measurement table,
and there is at least one “key” column that you can use to connect the
tables together.
1. In Merge this table, select the main measurement table (the one into which you want to merge extra information from other tables). Note that this may be your now stacked table.
2. In And this table, select the table that you want to bring information from.
3. Click on both blue arrows.
4. In the two dropdown menus about the ‘KEY column(s)’, select the column(s) that connect the two tables together. Select all columns that are common between tables, otherwise columns will be repeated in the output, with the extension ‘.y’ added to the names of those coming from the second table.
5. Click on ‘MERGE TABLES’.
If you are still not down to one table, another box will appear.
Repeat 1-5 with the remaining tables.
Click on ‘GO TO TIDY’.
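The effect of the key column choice can be seen with base R's merge(). The tables below are made up, and keeping every measurement row (all.x = TRUE) is an assumption about the intended behaviour, not a description of the app's internals.

```r
# Measurement table and plot table sharing two columns: site and plot
measurements <- data.frame(
  tag = c("T1", "T2", "T3"),
  site = c("S1", "S1", "S1"),
  plot = c("P1", "P1", "P2"),
  dbh = c(10.2, 15.1, 8.7)
)
plots <- data.frame(
  site = c("S1", "S1"),
  plot = c("P1", "P2"),
  elevation = c(120, 340)
)

# Using every shared column as a key keeps a single copy of each
merged <- merge(measurements, plots, by = c("site", "plot"), all.x = TRUE)
names(merged)

# Using only one shared column duplicates the other (site.x and site.y)
partial <- merge(measurements, plots, by = "plot", all.x = TRUE)
names(partial)
```

The second call shows where the ‘.y’ columns come from: any shared column left out of the key is kept once per table.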
2.4 Tidy table
At this stage, we want to make sure your data has one row per
observation and one variable per column.
If you collected the same type of information in several columns
(e.g. you added a column each time you visited a tree, or for each stem
of the tree etc…), you need to “tidy” your table (also called
wide-to-long reshaping).
In the top-most box, use the radio-buttons to indicate the reason you added new columns for a new observation.
The next set of boxes is pre-filled with our best guesses at the columns that may contain the same variable (columns that have similar names like dbh1, dbh2, or year1, year2…). Our guess may be terrible. Your role is to:
a. Indicate the name that you wish your variable to be called in the new column (e.g. dbh) in the text box. Note that this should start with a letter and contain no spaces.
b. Select all the columns of your data that represent the variable indicated in step a. (e.g. dbh1 and dbh2) using the drop-down menu.
c. Tick the little tick-box on the upper-left corner of the box, to indicate that you do want to take into account what you selected.
d. Repeat a-c for the next variable(s), e.g. you may need a box to indicate year in the text box and year1 and year2 in the drop-down menu. Don’t forget to tick the tick-box for those variables too.
Click on ‘TIDY’.
Click on ‘GO TO HEADERS’.
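The dbh1/dbh2 and year1/year2 case above can be reproduced with base R's reshape(), which handles several groups of columns at once. This is a sketch on a made-up table; the visit column name is an assumption, and the app may use a different method.

```r
# Toy wide table: two visits, each with its own dbh and year column
wide <- data.frame(
  tag = c("T1", "T2"),
  dbh1 = c(10.2, 15.1), dbh2 = c(11.0, 16.3),
  year1 = c(2015, 2015), year2 = c(2020, 2020)
)

# Each element of `varying` is one group of columns holding the same
# variable; `v.names` gives the name of the new column for each group
long <- reshape(
  wide,
  direction = "long",
  varying = list(c("dbh1", "dbh2"), c("year1", "year2")),
  v.names = c("dbh", "year"),
  idvar = "tag",
  timevar = "visit"
)
rownames(long) <- NULL
long
```

Each group in varying plays the role of one box in the tab, and the matching entry of v.names is the name typed in its text box.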
2.5 Headers and Units
Here, we want to know in what column some key information is
stored.
If…
A. … your data follows one of our pre-loaded
standards: You can select it to help fill out some of the
general information. But you should double check that it accurately
describes your particular data set.
B. … you have already gone through this step and saved your
profile (.rds file): You can upload your profile. Double check
that the information is filled out properly.
C. … this is your first time on this page: You have
to go through all the drop-down menus:
Column 1: Go through this column before indicating more information
in column 2. For each element, select the name of the field of your
data, if any, that corresponds best to what is asked. Leave “none”
if none of your variables applies.
Column 2: Complete the information of that column after you are done
with the information of column 1, because some fields may
appear/disappear depending on what you entered in column 1.
Once you are done, save your profile (a .rds file) so you won’t need
to go through all of this again (see B. above).
Click on ‘APPLY CHANGES’ and read any warnings
that may pop up. Adjust your entries if possible (it is okay to ignore
warnings) and re-apply your changes. Save (or update if needed) your
profile (.rds file).
Check that your newly formatted data looks ok. The headers and units are
now following ATFS’s standard. You can see what those are by clicking
the little button.
Click on ‘NEXT’.
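Conceptually, this step renames your columns and converts their units. The sketch below illustrates the idea in base R; the standardized names (TreeTag, DBH) and the target unit (cm) are assumptions for illustration, not the documented ATFS standard.

```r
# Raw table with network-specific headers; diameter recorded in millimetres
raw <- data.frame(tree_id = c("T1", "T2"), diam_mm = c(102, 151))

# Hypothetical mapping from the raw headers to standardized ones
header_map <- c(tree_id = "TreeTag", diam_mm = "DBH")

# Rename the columns by looking up each raw header in the mapping
standardized <- raw
names(standardized) <- unname(header_map[names(raw)])

# Convert millimetres to centimetres (the target unit is an assumption)
standardized$DBH <- standardized$DBH / 10
standardized
```

A profile stores this kind of mapping, which is why it can be reused for later uploads.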
2.6 Codes
This stage is only required if you indicated column(s) for tree codes
in the Tree Measurement section of the previous tab.
The table shows the list of codes that are available in the column(s)
you indicated. If you intend to translate these codes to match the ones
of another profile (which you will be able to do at a later step), or
vice-versa, you need to fill this table out.
Once you are done with the table you can update your profile (by
downloading and overwriting your .rds file). If you have already saved
your profile after this step and used it to fill out the previous step,
you can click on the “Use your profile” button to automatically fill the
table.
There is a list of predefined definitions, which, if used by you and
your collaborators, will help automatically translate your codes. But
if you can’t find a definition that matches yours, just type your
own.
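Shared definitions make automatic translation possible because they act as a join key between two code lists. Here is a minimal base R sketch; all codes and definitions are made up, and the app's own translation logic may differ.

```r
# Each network maps its own codes to a shared definition (values are made up)
codes_a <- data.frame(code = c("D", "B"), definition = c("dead", "broken stem"))
codes_b <- data.frame(code = c("X", "BR"), definition = c("dead", "broken stem"))

# Join on the shared definition to build a translation table from A to B
translation <- merge(codes_a, codes_b, by = "definition", suffixes = c("_a", "_b"))

# Translate a vector of network A codes into network B codes
tree_codes <- c("D", "B", "D")
translated <- as.character(translation$code_b[match(tree_codes, translation$code_a)])
translated
```

A code whose definition has no counterpart in the other list would come back as NA, which is why matching definitions matter.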